##Introducere #Reprezentarea datelor sub forma de retea e o abordare diferita fata de modul clasic de reprezentare, cel tabelar. Acest format poate evidentia noi caracteristici ale datelor si imbunatesteste vizualizarea acestora intr-un mod semnificativ. #Pentru a realiza o analiza originala, setul de date folosit reprezinta structura unei retele de trafic cu tigari de contrabanda din Romania, datele fiind extrase dintr-un dosar penal. Astfel, utilizand metode de graph mining vom realiza o analiza asupra structurii retelei, dar si asupra rolurilor individuale ale membrilor.

Capitolul 1 - Modelarea datelor in R

#Datele au fost modelate folosind un obiect de tip reţea din librăria statnet. Legăturile dintre noduri au fost introduse folosind o lista de muchii, iar nodurile au următoarele atribute : nume, nume abreviat si rol.

knitr::opts_chunk$set(echo = FALSE)
library(statnet)
## Loading required package: tergm
## Loading required package: ergm
## Loading required package: network
## network: Classes for Relational Data
## Version 1.16.1 created on 2020-10-06.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Martina Morris, University of Washington
##                     Skye Bender-deMoll, University of Washington
##  For citation information, type citation("network").
##  Type help("network-package") to get started.
## 
## ergm: version 3.11.0, created on 2020-10-14
## Copyright (c) 2020, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, UNSW Sydney
##                     Martina Morris, University of Washington
##                     with contributions from
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Skye Bender-deMoll, University of Washington
##                     Chad Klumb
##                     Michał Bojanowski, Kozminski University
##                     Ben Bolker
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm").
## NOTE: Versions before 3.6.1 had a bug in the implementation of the bd()
## constraint which distorted the sampled distribution somewhat. In
## addition, Sampson's Monks datasets had mislabeled vertices. See the
## NEWS and the documentation for more details.
## NOTE: Some common term arguments pertaining to vertex attribute and
## level selection have changed in 3.10.0. See terms help for more
## details. Use 'options(ergm.term=list(version="3.9.4"))' to use old
## behavior.
## Loading required package: networkDynamic
## 
## networkDynamic: version 0.10.1, created on 2020-01-16
## Copyright (c) 2020, Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll, University of Washington
##                     with contributions from
##                     Zack Almquist, University of California -- Irvine
##                     David R. Hunter, Penn State University
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Steven M. Goodreau, University of Washington
##                     Jeffrey Horner
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("networkDynamic").
## 
## tergm: version 3.7.0, created on 2020-10-15
## Copyright (c) 2020, Pavel N. Krivitsky, UNSW Sydney
##                     Mark S. Handcock, University of California -- Los Angeles
##                     with contributions from
##                     David R. Hunter, Penn State University
##                     Steven M. Goodreau, University of Washington
##                     Martina Morris, University of Washington
##                     Nicole Bohme Carnegie, New York University
##                     Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Skye Bender-deMoll
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Chad Klumb
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("tergm").
## Loading required package: ergm.count
## 
## ergm.count: version 3.4.0, created on 2019-05-15
## Copyright (c) 2019, Pavel N. Krivitsky, University of Wollongong
##                     with contributions from
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm.count").
## NOTE: The form of the term 'CMP' has been changed in version 3.2 of
## 'ergm.count'. See the news or help('CMP') for more information.
## Loading required package: sna
## Loading required package: statnet.common
## 
## Attaching package: 'statnet.common'
## The following object is masked from 'package:base':
## 
##     order
## sna: Tools for Social Network Analysis
## Version 2.6 created on 2020-10-5.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
## Loading required package: tsna
## 
## statnet: version 2019.6, created on 2019-06-13
## Copyright (c) 2019, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("statnet").
library(RColorBrewer)
library(network)
library(UserNetR)

#Pentru a ne asigura ca graful este unul ne-orientat, simetrizam matricea de adiacenta asociata primului obiect, si generam un nou obiect, pentru a lucra cu un graf ne-orientat. Acest lucru se datoreaza faptului ca relatiile sociale in cadrul acestei retele nu pot fi uni-directionale.

#Extragerea atributelor retelei in variabile separate, pentru a fi folosite ulterior in operatiile de plotare.

#Un prim plot al retelei pentru a vizualiza structura acesteia, si impartirea membrilor pe roluri.

Capitolul 2 - Analiza primara a retelei

#O prima analiza asupra retelei este realizarea rezumatului in 5 puncte. Functiile prezente in libraria statnet faciliteaza realizarea acesteia. Analizand aceste valori, putem avea o prima impresie despre structura retelei si despre modul de organizarea a acesteia.

print("BASIC CHARACTERISTICS")
## [1] "BASIC CHARACTERISTICS"
summary(netsym, print.adj = FALSE)
## Network attributes:
##   vertices = 21
##   directed = TRUE
##   hyper = FALSE
##   loops = FALSE
##   multiple = FALSE
##   bipartite = FALSE
##  total edges = 72 
##    missing edges = 0 
##    non-missing edges = 72 
##  density = 0.1714286 
## 
## Vertex attributes:
## 
##  abrev_name:
##    character valued attribute
##    attribute summary:
##    the 10 most common values are:
## BA BC BG BL CI DC DD DI DR GG 
##  1  1  1  1  1  1  1  1  1  1 
## 
##  alldeg:
##    numeric valued attribute
##    attribute summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   4.000   6.000   6.857  10.000  14.000 
## 
##  role:
##    character valued attribute
##    attribute summary:
##  A  C CR CT  D 
##  2 12  1  4  2 
##   vertex.names:
##    character valued attribute
##    21 valid vertex names
## 
## No edge attributes
print("Size:")
## [1] "Size:"
print(network.size(netsym))
## [1] 21
print("Density:")
## [1] "Density:"
print(gden(netsym))
## [1] 0.1714286
print("Components:")
## [1] "Components:"
print(components(netsym))
## [1] 1
print("Diameter:")
## [1] "Diameter:"
gd <- geodist(netsym)
print(max(gd$gdist))
## [1] 7
print("Transitivity:")
## [1] "Transitivity:"
print(gtrans(netsym, mode="graph"))
## [1] 0.25

3. Capitolul 3 - Managementul datelor atribuite unei retele

Folosind atributele definite în momentul creării, putem filtra reţeua astfel încat putem evidenţia importanţa unui anume rol. Spre exemplu, dacă am păstra în reţea doar Comercianţii, putem observa că aceştia sunt în mare partea izolaţi, distrugând aspectul de reţea compactă. Acest lucru evidenţiază rolul contrabandiştilor în reţea, aceştia asigurând practic conexitatea reţelei.

print("Filtering networks")
print(get.vertex.attribute(netsym, "role"))
comercianti <- get.inducedSubgraph(netsym, which (netsym %v% "role"=="C"))
gplot(comercianti,displaylabels=TRUE, main="Comercianti")

delete.vertices(comercianti, isolates(comercianti))
gplot(comercianti, displaylabels = TRUE, main="Grupuri de comercianti")

# 4.Basic network plotting and layout

# Circle
gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
      vertex.cex=1.5,mode='circle',main="circle")

Eigen

gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
      vertex.cex=1.5,mode='eigen',main="eigen")

Random

gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
      vertex.cex=1.5,mode='random',main="random")

Spring

gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
      vertex.cex=1.5,mode='spring',main="spring")

Fruchterman-Reingold

gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
      vertex.cex=1.5,mode='fruchtermanreingold',main='fruchtermanreingold')

Kamada-Kawai

gplot(netsym,gmode="graph",edge.col="grey75",displaylabels=T,
      vertex.cex=1.5,mode='kamadakawai',
      main='kamadakawai')

5.Effective network graphic design

library(network)

library(intergraph)
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:sna':
## 
##     betweenness, bonpow, closeness, components, degree, dyad.census,
##     evcent, hierarchy, is.connected, neighborhood, triad.census
## The following objects are masked from 'package:network':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
##     get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
##     is.directed, list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(networkD3)
plot(netsym,vertex.cex=0.5,main="Too small nodes")

plot(netsym,vertex.cex=6,main="Too large nodes")

plot(netsym,vertex.cex=2,main="Just right node size")

Different node type

sidenum <- 3:7
rolecat <- as.factor(get.vertex.attribute(asIgraph(netsym),"role"))
plot(netsym,usearrows=FALSE,vertex.cex=4, main="Different node type",
     displaylabels=F,vertex.sides=sidenum[rolecat])

Edge coloring example

n_edge <- network.edgecount(netsym)
linecol_pal <- c("blue","red","green")
edge_cat <- sample(1:3,n_edge,replace=T)
plot(netsym,vertex.cex=1.5,vertex.col="grey25", main="Edge coloring example",
     edge.col=linecol_pal[edge_cat],edge.lwd=2)

Different edge width

widths <- c(2,6,10)
plot(netsym,vertex.cex=1.5,main="Different edge width",
     edge.lwd=1.5*widths)

Different edge type

n_edge <- network.edgecount(netsym)
edge_cat <- sample(1:3,n_edge,replace=T)
line_pal <- c(2,3,4)
gplot(netsym,vertex.cex=0.8,gmode="graph", main="Different edge type",
      vertex.col="gray50",edge.lwd=1.5,
      edge.lty=line_pal[edge_cat])

Infractional network

my_pal <- brewer.pal(5,"Dark2")
rolecat <- as.factor(get.vertex.attribute(asIgraph(netsym),"role"))
plot(netsym,
     main = "Infractional network",
     usearrows=FALSE, 
     mode="fruchtermanreingold", 
     vertex.col = my_pal[rolecat],
     label=abrevnamelab,
     displaylabels=T,
     vertex.cex = 1.5)
legend("bottomleft",legend=c("Aducator clienti","Comerciant","Cartita","Contrabandist","Depozitare"),
       col=my_pal,pch=19,pt.cex=1.5,bty="n",
       title="Criminal Role")

# necessary, caused conflicts
detach("package:statnet", unload=TRUE)

6.Advanced Network Graphics

Tkplot

inetsym <- asIgraph(netsym)
Coord <- tkplot(inetsym, vertex.size=3,
                vertex.label=V(inetsym)$role,
                vertex.color="darkgreen")
MCoords <- tkplot.getcoords(Coord)
plot(inetsym, layout=MCoords, vertex.size=5,main="Interactive tkplot",
     vertex.label=NA, vertex.color="lightblue")

# NetworkD3
inetsym_edge <- get.edgelist(inetsym)
inetsym_edge <- inetsym_edge - 1
inetsym_edge <- data.frame(inetsym_edge)
print(V(inetsym)$role)
##  [1] "C"  "C"  "C"  "CR" "C"  "C"  "CT" "CT" "CT" "C"  "C"  "A"  "A"  "C"  "C" 
## [16] "C"  "C"  "C"  "CT" "D"  "D"
inetsym_nodes <- data.frame(NodeID=as.numeric(V(inetsym)-1),
                          Group=V(inetsym)$role,
                          Nodesize=(degree(inetsym)))
net_D3 <- forceNetwork(Links = inetsym_edge, Nodes = inetsym_nodes,
             Source = "X1", Target = "X2",
             NodeID = "NodeID",Nodesize = "Nodesize",
             radiusCalculation="Math.sqrt(d.nodesize)*3",
             Group = "Group", opacity = 0.8,
             legend=TRUE)

saveNetwork(net_D3,file = 'Net_test2.html',
            selfcontained=TRUE)


#Visnetwork
library(visNetwork)
inetsym_edge <- get.edgelist(inetsym)
inetsym_edge <- data.frame(from = inetsym_edge[,1],
                         to = inetsym_edge[,2])
inetsym_nodes <- data.frame(id = as.numeric(V(inetsym)))
visNetwork(inetsym_nodes, inetsym_edge, width = "100%")
net <- visNetwork(inetsym_nodes, inetsym_edge,
                  width = "100%",legend=TRUE)
## Warning in visNetwork(inetsym_nodes, inetsym_edge, width = "100%", legend =
## TRUE): 'legend' and 'legend.width' are deprecated (visNetwork >= 0.1.2). Please
## now prefer use visLegend function.
net <- visOptions(net,highlightNearest = TRUE)
net <- visInteraction(net,navigationButtons = TRUE)
library(htmlwidgets)
## 
## Attaching package: 'htmlwidgets'
## The following object is masked from 'package:networkD3':
## 
##     JS
saveWidget(net, "Net_test3.html")

Chord diagram

library(circlize)
## ========================================
## circlize version 0.4.11
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: https://jokergoo.github.io/circlize_book/book/
## 
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization
##   in R. Bioinformatics 2014.
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(circlize))
## ========================================
## 
## Attaching package: 'circlize'
## The following object is masked from 'package:igraph':
## 
##     degree
## The following object is masked from 'package:sna':
## 
##     degree
library(statnet)
## 
## statnet: version 2019.6, created on 2019-06-13
## Copyright (c) 2019, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("statnet").
sociomat <- as.sociomatrix(netsym,attrname='passes')
## Warning in as.matrix.network.adjacency(x, attrname = attrname, expand.bipartite
## = expand.bipartite, : There is no edge attribute named passes
chordDiagram(sociomat)

detach("package:statnet", unload=TRUE)
detach("package:circlize", unload=TRUE)

7. Actor proeminence

detach("package:networkD3", unload=TRUE)
detach("package:igraph", unload=TRUE)
print("CENTRALITY DEGREES")
## [1] "CENTRALITY DEGREES"
print(degree(netsym, gmode="graph"))
##  [1] 6 2 4 1 5 5 3 7 6 2 2 5 3 3 2 2 2 2 6 2 2
print(closeness(netsym, gmode="graph"))
##  [1] 0.4761905 0.3333333 0.3846154 0.3278689 0.3278689 0.3278689 0.3174603
##  [8] 0.4166667 0.4081633 0.2531646 0.2941176 0.2564103 0.3636364 0.4444444
## [15] 0.2941176 0.2941176 0.2941176 0.2941176 0.3846154 0.3076923 0.3076923
print(betweenness(netsym, gmode="graph"))
##  [1] 113.1666667   0.0000000   4.1666667   0.0000000   9.6666667   9.6666667
##  [7]   0.0000000  51.0000000  36.0000000   0.0000000   3.6000000   7.0000000
## [13]  16.6000000  96.0000000   2.8500000   2.8500000   2.8500000   2.8500000
## [19]  69.4000000   0.1666667   0.1666667
#Cutpoints
cpnet <- cutpoints(netsym,mode="graph",
                   return.indicator=TRUE)
gplot(netsym,gmode="graph",vertex.col=cpnet+2,coord=MCoords,
      jitter=FALSE,displaylabels=TRUE)

#Bridges
bridges <- function(dat,mode="graph",
                    connected=c("strong", "weak")) {
   e_cnt <- network.edgecount(dat)
   if (mode == "graph") {
      cmp_cnt <- components(dat)
      b_vec <- rep(FALSE,e_cnt)
      for(i in 1:e_cnt){
         dat2 <- dat
         delete.edges(dat2,i)
         b_vec[i] <- (components(dat2) != cmp_cnt)
      }
   }
   else {
      cmp_cnt <- components(dat,connected=connected)
      b_vec <- rep(FALSE,e_cnt)
      for(i in 1:e_cnt){
         dat2 <- dat
         delete.edges(dat2,i)
         b_vec[i] <- (components(dat2) != cmp_cnt)
      }
   }
   return(b_vec)
}
bridges(netsym)
##  [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [13]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
colors <- c("blue", "red")

# Determining the centre nodes using the degree
deg <- degree(netsym, gmode="graph")
plot(netsym,
     usearrows=FALSE, 
     vertex.col = colors[(deg >= 5) + 1],
     label = netsym %v% "abrev_name",
     displaylabels=T,
     vertex.cex = deg/2)

# Determining the centre nodes using the closeness function
cls <- closeness(netsym, gmode="graph")
plot(netsym,
     usearrows=FALSE, 
     vertex.col = colors[(cls >= 0.33) + 1],
     label = netsym %v% "abrev_name",
     displaylabels=T,
     vertex.cex = cls*10)

# Determining the centre nodes using the betweenness function
bet <- betweenness(netsym, gmode="graph")
plot(netsym,
     usearrows=FALSE, 
     vertex.col = colors[(bet >= 90) + 1],
     label = netsym %v% "abrev_name",
     displaylabels=T,
     vertex.cex = sqrt(bet+1))

# Computing the level of correlation between multiple centrality measures
df.prom <- data.frame(
        deg = degree(netsym),
        cls = closeness(netsym),
        btw =  betweenness(netsym),
        evc = evcent(netsym),
        inf = infocent(netsym),
        flb = flowbet(netsym)
)
cor(df.prom)
##           deg       cls       btw       evc       inf       flb
## deg 1.0000000 0.6013689 0.5917256 0.6360877 0.7918289 0.5708101
## cls 0.6013689 1.0000000 0.8545112 0.4791390 0.8593100 0.8230555
## btw 0.5917256 0.8545112 1.0000000 0.2297788 0.7352932 0.9357088
## evc 0.6360877 0.4791390 0.2297788 1.0000000 0.7469055 0.3381616
## inf 0.7918289 0.8593100 0.7352932 0.7469055 1.0000000 0.7994418
## flb 0.5708101 0.8230555 0.9357088 0.3381616 0.7994418 1.0000000
# Tabular visualization for multiple centrality measures
# Defining a data frame in which is computed the centrality for all nodes using
# multiple methods
df.prom2 <- data.frame(
        name = network.vertex.names(netsym),
        degree = degree(netsym, gmode="graph"),
        closeness = closeness(netsym, gmode="graph"),
        betweenness = betweenness(netsym, gmode="graph"))
df.promsort <- df.prom2[order(-df.prom2$degree),]
cd <- centralization(netsym,degree)
cc <- centralization(netsym,closeness)
cb <- centralization(netsym,betweenness)
df.promsort <- rbind(df.promsort,data.frame(
        name = "Centralization level",
        degree = cd,
        closeness = cc,
        betweenness = cb
))
df.promsort
##                     name    degree closeness betweenness
## 8           T**a G***ghe 7.0000000 0.4166667  51.0000000
## 1          B***cu L***na 6.0000000 0.4761905 113.1666667
## 9            S**m An**la 6.0000000 0.4081633  36.0000000
## 19            D**a I***l 6.0000000 0.3846154  69.4000000
## 5            M**tu M**na 5.0000000 0.3278689   9.6666667
## 6           Ma**u I***he 5.0000000 0.3278689   9.6666667
## 12           M***u L**do 5.0000000 0.2564103   7.0000000
## 3         B**scu C***nel 4.0000000 0.3846154   4.1666667
## 7              T**a F**p 3.0000000 0.3174603   0.0000000
## 13             D**a D**a 3.0000000 0.3636364  16.6000000
## 14             D**a C**l 3.0000000 0.4444444  96.0000000
## 2         B***cu An***us 2.0000000 0.3333333   0.0000000
## 10        G**ca G****ghe 2.0000000 0.2531646   0.0000000
## 11             C**u I**n 2.0000000 0.2941176   3.6000000
## 15            N**cu P**u 2.0000000 0.2941176   2.8500000
## 16           N**se T**er 2.0000000 0.2941176   2.8500000
## 17        S***an C***tin 2.0000000 0.2941176   2.8500000
## 18           O***u A**ei 2.0000000 0.2941176   2.8500000
## 20           P**ci V***e 2.0000000 0.3076923   0.1666667
## 21          D***mir R**a 2.0000000 0.3076923   0.1666667
## 4          B**hiu G***ge 1.0000000 0.3278689   0.0000000
## 110 Centralization level 0.1973684 0.1518153   0.5127632
# Cutpoints are nodes that if removed will affect the conectivity of the network
# In the graphic below, it is displayed with green the cutpoint nodes.
cpnet <- cutpoints(netsym,mode="graph",return.indicator=TRUE)
gplot(netsym,gmode="graph",vertex.cex=cpnet+2,vertex.col=cpnet+2,jitter=FALSE,
      displaylabels=TRUE,label=netsym %v% "abrev_name")

# Bridges are edges that if removed will affect the conectivity of the network
# In the graphic below it is displayed with green the edges that are bridges.
bridges <- function(dat,mode="graph",connected=c("strong", "weak")) {
        e_cnt <- network.edgecount(dat)
        if (mode == "graph") {
                cmp_cnt <- components(dat)
                b_vec <- rep(FALSE,e_cnt)
                for(i in 1:e_cnt){
                        dat2 <- dat
                        delete.edges(dat2,i)
                        b_vec[i] <- (components(dat2) != cmp_cnt)
                }
        }
        else {
                cmp_cnt <- components(dat,connected=connected)
                b_vec <- rep(FALSE,e_cnt)
                for(i in 1:e_cnt){
                        dat2 <- dat
                        delete.edges(dat2,i)
                        b_vec[i] <- (components(dat2,connected=connected) != cmp_cnt)
                }
        }
        return (b_vec)
}
bridges(netsym)
##  [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [13]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
brnet <- bridges(netsym)
gplot(netsym,gmode="graph",vertex.col="red",edge.col=brnet+2,jitter=FALSE,
      displaylabels=TRUE,label=netsym %v% "abrev_name",edge.lwd=3*brnet+2)

Capitolul 8: Subgrupuri

Acest capitol este dedicat descoperirii subgrupurilor dintr-o retea. Acest tip de analiza are un rol foarte important, deoarece s-a observat ca in majoritatea retelelor sociale, indiferent de dimensiunea acestora, exista o tendinta de separare a actantilor in subgrupuri puternic interconectate, iar ale caror legaturi cu membrii din afara subgrupului sunt intr-un numar relativ mic. Desi pentru o retea de dimensiuni reduse nu este necesara o analiza amanuntita pentru a descoperi grupurile din cadrul unei retele, pentru dimensiuni considerabile ale retelelor, existenta subgrupurilor nu mai este atat de evidenta, iar complexitatea metodelor necesare trebuie sa fie mai mare.

Cea mai intuitiva abordare a subgrupurilor a fost de a le lega de notiunea de coeziune sociala, adica sunt considerate ca fiind subgrupuri submultimile de noduri ce contin cat mai multe conexiuni directe. In cazul retelelor care contin o componenta ce ofera informatii despre puterea unei legaturi (spre exemplu, intr-o retea de socializare putem privi relatiile de pretenie ca avand un grad determinat de frecventa de interactiune: zilnic, saptamanal, lunar, ocazional), aceasta poate fi integrata in determinarea subgrupurilor tinand cont de faptul ca legaturile mai slabe sunt folosite mai probabil pentru a conecta doua subgrupuri diferite decat pentru a conecta doua noduri din cadrul aceluiasi subgrup.

O prima abordare bazata pe notiunea de coeziune sociala, a fost de a considera subgrup doar submultimile care au conexiune intre oricare doua noduri. Mai exact, aceasta separare reprezinta de fapt, extragerea de subgrafuri complete din reteaua initiala. Un subgrup determinat prin aceasta metoda poarta denumirea de “clique”. Mai jos sunt utilizate cateva metode pentru determinarea acestora in reteaua de traficanti. Pentru a intelege mai bine rezultatele urmatoare sunt necesare doua observatii: * putem sa consideram o dimensiune minima pentru un subgrup, de regula 3 noduri, deoarece subgrupurile de dimensiune 1 si 2 nu sunt relevante (orice nod poate fi considerat un subgrup si orice doua noduri conectate pot fi considerate un subgrup). * subgrupurile formate din patru noduri cuprind (sub forma de subgraf) subgrupuri formate din trei noduri, deci are sens sa introducem notiunea de “clique” maximal (care nu mai poate fi extins prin adaugarea altor noduri).

# Determine the maximum size of a clique
clique.number(inetsym)
## [1] 4
# Determine all the cliques
cliques(inetsym, min=3)
## [[1]]
## + 3/21 vertices, named, from dadf645:
## [1] MI TF TG
## 
## [[2]]
## + 3/21 vertices, named, from dadf645:
## [1] DD DC DI
## 
## [[3]]
## + 3/21 vertices, named, from dadf645:
## [1] BL BA BC
## 
## [[4]]
## + 3/21 vertices, named, from dadf645:
## [1] BL BC SA
## 
## [[5]]
## + 3/21 vertices, named, from dadf645:
## [1] BL BC TG
## 
## [[6]]
## + 3/21 vertices, named, from dadf645:
## [1] MM MI TF
## 
## [[7]]
## + 4/21 vertices, named, from dadf645:
## [1] MM MI TF TG
## 
## [[8]]
## + 3/21 vertices, named, from dadf645:
## [1] MM MI GG
## 
## [[9]]
## + 3/21 vertices, named, from dadf645:
## [1] MM MI SA
## 
## [[10]]
## + 3/21 vertices, named, from dadf645:
## [1] MM MI TG
## 
## [[11]]
## + 3/21 vertices, named, from dadf645:
## [1] MM TF TG
# Determine the maximal cliques
maximal.cliques(inetsym, min=3)
## [[1]]
## + 3/21 vertices, named, from dadf645:
## [1] BA BL BC
## 
## [[2]]
## + 3/21 vertices, named, from dadf645:
## [1] DC DD DI
## 
## [[3]]
## + 3/21 vertices, named, from dadf645:
## [1] GG MM MI
## 
## [[4]]
## + 4/21 vertices, named, from dadf645:
## [1] TF MM TG MI
## 
## [[5]]
## + 3/21 vertices, named, from dadf645:
## [1] MM MI SA
## 
## [[6]]
## + 3/21 vertices, named, from dadf645:
## [1] BC BL TG
## 
## [[7]]
## + 3/21 vertices, named, from dadf645:
## [1] BC BL SA
# Determine the cliques with maximum size
largest.cliques(inetsym)
## [[1]]
## + 4/21 vertices, named, from dadf645:
## [1] TG MM MI TF

In majoritatea retelelor, constrangerea de a avea legaturi intre oricare doua noduri dintr-un subgrup este prea puternica, fapt ce conduce la determinarea unor subgrupuri de dimensiuni reduse, chiar si in cadrul retelelor foarte mari. Astfel, a fost introdusa notiunea de “k-Core”, care relaxeaxa cerinta legata de numarul de legaturi pe care trebuie sa le aiba un nod cu celelalte noduri din subgrup. Concret, daca extragem un k-Core, toate nodurile trebuie sa aiba gradul cel putin egal cu k. O observatie importanta este ca o analiza “1-Core” este echivalenta cu determinarea componentelor conexe din cadrul unui graf. In cele ce urmeaza sunt analizate componentele “k-Core” din cadrul retelei de traficanti.

## For each k, determine the number of nodes that are part of a k-Core (but not a (k+1)-Core, k is maximum)
coreness <- graph.coreness(inetsym)
table(coreness)
## coreness
##  1  2  3 
##  1 13  7
# Plot the network, with each node having its maximum k on it
maxCoreness <- max(coreness)
colors <- rainbow(maxCoreness)
plot(inetsym,vertex.label=coreness,vertex.color=colors[coreness],layout=layout_with_fr)

# Gradually remove the nodes that are only part of 1-Cores and 2-Cores
i1_3 <- inetsym
i2_3 <- induced.subgraph(inetsym, vids=which(coreness > 1))
i3_3 <- induced.subgraph(inetsym, vids=which(coreness > 2))
lay <- layout.fruchterman.reingold(inetsym)
op <- par(mfrow=c(1,3),mar = c(3,0,2,0))
plot(i1_3,layout=lay,vertex.label=coreness,vertex.color=colors[coreness],main="All k-cores")
plot(i2_3,layout=lay[which(coreness > 1),],vertex.label=coreness[which(coreness > 1)],vertex.color=colors[coreness[which(coreness > 1)]],main="k-cores 2-3")
plot(i3_3,layout=lay[which(coreness > 2),],vertex.label=coreness[which(coreness > 2)],vertex.color=colors[coreness[which(coreness > 2)]],main="k-cores 3")

Pentru a putea realiza o comparatie intre mai multe moduri de a imparti o retea in subgrupuri, este necesara existenta unei metode de a stabili cat de potrivita este o impartire in subgrupuri. Astfel, a fost introdusa notiunea de modularitate. Modularitate este o statistica ce poate fi calculata efectuand scaderea dintre probabilitatea ca alegand o legatura din retea aceasta sa fie intre noduri din acelasi subgrup si aceeasi probabilitate calculata intr-o retea formata din aceleasi noduri, dar in care muchiile sunt distribuite in mod aleator (numarul de muchii din reteaua initiala este conservat). Prin urmare, vom incerca sa realizam o impartire in subgrupuri a retelei de traficanti folosind drept criteriu rolul fiecarui actant. Se poate observa la sfarsit ca acest criteriu nu este cel mai potrivit tinand cont de faptul ca modularitatea obtinuta este negativa.

## Computing the modularity of the clusterization method
V(inetsym)$group <- strtoi(V(inetsym)$role)
modularity(inetsym, V(inetsym)$group)
## [1] -0.1508488
## Visualizing the network by emphasizing the role of each node
colors <- brewer.pal(5,"Dark2")
V(inetsym)$color <- colors[strtoi(V(inetsym)$role)]
op <- par(mfrow=c(1,1))
plot(inetsym,vertex.color=V(inetsym)$color,vertex.size=20)

Pana in acest moment, metodele prezentate (“clique” si “k-Core”) exploreaza doar legaturile interne din cadrul unui subgrup, nefiind analizate tipare ale legaturilor catre exteriorul unui subgrup. De aceea, au fost dezvoltati diversi algoritmi care sa exploreze aceasta zona. In cele ce urmeaza vom folosi mai multi algoritmi de acest tip, denumiti “Community Detection”, pentru a analiza rezultatele acestora pe reteaua de traficanti, precum si pentru a compara intre ele aceste rezultate (intre unii algoritmi se poate observa o similitudine in rezultate, data si de constructia similara a acestora).

## Applying multiple community detection algorithms
cw <- cluster_walktrap(inetsym)
modularity(cw); membership(cw)
## [1] 0.4903549
## BL BA BC BG MM MI TF TG SA GG CI ML DD DC NP NT SC OA DI PV DR 
##  3  3  3  3  2  2  2  2  2  2  1  1  1  1  1  1  1  1  1  2  2
ceb <- cluster_edge_betweenness(inetsym)
modularity(ceb); membership(ceb)
## [1] 0.4903549
## BL BA BC BG MM MI TF TG SA GG CI ML DD DC NP NT SC OA DI PV DR 
##  1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  2  2
cs <- cluster_spinglass(inetsym)
modularity(cs); membership(cs)
## [1] 0.4903549
## BL BA BC BG MM MI TF TG SA GG CI ML DD DC NP NT SC OA DI PV DR 
##  3  3  3  3  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  1  1
cfg <- cluster_fast_greedy(inetsym)
modularity(cfg); membership(cfg)
## [1] 0.4695216
## BL BA BC BG MM MI TF TG SA GG CI ML DD DC NP NT SC OA DI PV DR 
##  1  1  1  1  3  3  3  3  1  3  2  2  2  2  2  2  2  2  2  1  1
clp <- cluster_label_prop(inetsym)
modularity(clp); membership(clp)
## [1] 0.4548611
## BL BA BC BG MM MI TF TG SA GG CI ML DD DC NP NT SC OA DI PV DR 
##  1  1  2  1  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  2  2
cle <- cluster_leading_eigen(inetsym)
modularity(cle); membership(cle)
## [1] 0.464892
## BL BA BC BG MM MI TF TG SA GG CI ML DD DC NP NT SC OA DI PV DR 
##  1  1  1  1  3  3  3  3  1  3  2  2  2  2  2  2  2  2  2  3  3
cl <- cluster_louvain(inetsym)
modularity(cl); membership(cl)
## [1] 0.4903549
## BL BA BC BG MM MI TF TG SA GG CI ML DD DC NP NT SC OA DI PV DR 
##  3  3  3  3  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  1  1
# Comparing the community detection algorithms
table(V(inetsym)$role,membership(cw))
##    
##     1 2 3
##   1 2 0 0
##   2 6 3 3
##   3 0 0 1
##   4 1 3 0
##   5 0 2 0
compare(as.numeric(factor(V(inetsym)$role)),cw,method="adjusted.rand")
## [1] 0.02816901
compare(cw,ceb,method="adjusted.rand")
## [1] 1
compare(cw,cs,method="adjusted.rand")
## [1] 1
compare(cw,cfg,method="adjusted.rand")
## [1] 0.7075812
# Visualizing the splitting made by each method
op <- par(mfrow=c(3,2),mar=c(3,0,2,0))
plot(ceb, inetsym,vertex.label=NA,main="Edge Betweenness")
plot(cfg, inetsym,vertex.label=NA,main="Fastgreedy")
plot(clp, inetsym,vertex.label=NA,main="Label Propagation")
plot(cle, inetsym,vertex.label=NA,main="Leading Eigenvector")
plot(cs, inetsym,vertex.label=NA,main="Spinglass")
plot(cw, inetsym,vertex.label=NA,main="Walktrap")

Capitolul 9: “Affiliation Networks”

Conceptele din acest capitol nu pot fi ilustrate pe reteaua traficantilor, de aceea exemple prezentate utilizeaza un set de date ilustrat in cartea din bibliografie. Aceste notiuni sunt aplicate pe grafuri in care legaturile dintre actanti nu sunt directe, ci sunt date de faptul ca cele doua persoane sunt parte al aceluiasi grup (de exemplu: clasa, departament, oras).

Un exemplu foarte uzual pentru acest tip de retele sunt cele denumite “2-Mode Networks”. Acestea se remarca prin faptul ca nodurilor din cadrul lor reprezinta doua clase diferite. Mai jos este prezentata o retea formata din elevi si clase, fiind realizate legaturi doar intre obiecte de tipuri diferite.

C1 <- c(1,1,1,0,0,0)
C2 <- c(0,1,1,1,0,0)
C3 <- c(0,0,1,1,1,0)
C4 <- c(0,0,0,0,1,1)
aff.df <- data.frame(C1,C2,C3,C4)
row.names(aff.df) <- c("S1","S2","S3","S4","S5","S6")
aff.df
##    C1 C2 C3 C4
## S1  1  0  0  0
## S2  1  1  0  0
## S3  1  1  1  0
## S4  0  1  1  0
## S5  0  0  1  1
## S6  0  0  0  1

O consecinta a faptului ca avem doua tipuri de noduri, iar legaturile se realizeaza doar intre tipuri diferite de noduri, este ca graful obtinut este bipartit, dupa cum se poate observa si din graficul urmator.

## Plotting the affiliation network to illustrate the bipartite property
bn <- graph.incidence(aff.df)
plt.x <- c(rep(2,6),rep(4,4))
plt.y <- c(7:2,6:3)
lay <- as.matrix(cbind(plt.x,plt.y))
shapes <- c("circle","square")
colors <- c("blue","red")
plot(bn,vertex.color=colors[V(bn)$type+1],vertex.shape=shapes[V(bn)$type+1],
     vertex.size=10,vertex.label.degree=-pi/2,vertex.label.dist=1.2,
     vertex.label.cex=0.9,layout=lay)

Similar cu celelalte retele, si “affiliation networks” pot fi create atat pornind de la o matrice de incidenta, cat si de la o lista de muchii. In cele ce urmeaza sunt prezentate exemple pentru ambele metode.

## Creating Affiliation Networks from Incidence Matrices
bn <- graph.incidence(aff.df)
bn
## IGRAPH 689a32a UN-B 10 11 -- 
## + attr: type (v/l), name (v/c)
## + edges from 689a32a (vertex names):
##  [1] S1--C1 S2--C1 S2--C2 S3--C1 S3--C2 S3--C3 S4--C2 S4--C3 S5--C3 S5--C4
## [11] S6--C4
get.incidence(bn)
##    C1 C2 C3 C4
## S1  1  0  0  0
## S2  1  1  0  0
## S3  1  1  1  0
## S4  0  1  1  0
## S5  0  0  1  1
## S6  0  0  0  1
V(bn)$type
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
V(bn)$name
##  [1] "S1" "S2" "S3" "S4" "S5" "S6" "C1" "C2" "C3" "C4"
## Creating Affiliation Networks from Edge Lists
el.df <- data.frame(rbind(c("S1","C1"),
                          c("S2","C1"),
                          c("S2","C2"),
                          c("S3","C1"),
                          c("S3","C2"),
                          c("S3","C3"),
                          c("S4","C2"),
                          c("S4","C3"),
                          c("S5","C3"),
                          c("S5","C4"),
                          c("S6","C4")))
bn2 <- graph.data.frame(el.df,directed=FALSE)
V(bn2)$type <- V(bn2)$name %in% el.df[,1]
bn2
## IGRAPH b1078a4 UN-B 10 11 -- 
## + attr: name (v/c), type (v/l)
## + edges from b1078a4 (vertex names):
##  [1] S1--C1 S2--C1 S2--C2 S3--C1 S3--C2 S3--C3 S4--C2 S4--C3 S5--C3 S5--C4
## [11] S6--C4
graph.density(bn)==graph.density(bn2)
## [1] TRUE

O analiza interesanta ce poate fi realizata pe un “affiliation network”, este un studiu pe componente ce poate fi realizat cu ajutorul proiectiilor. O proiectie permite analiza doar unuia dintre cele doua tipuri de noduri prezente intr-o retea “2-Mode”. Legaturile dintre noduri in cadrul unei proiectii sunt date de legaturile indirecte din reteaua initiala (daca doi studenti stuadiaza in aceeasi clasa, atunci intre acestia va exista o legatura directa in proiectie). In cele ce urmeaza vom ilustra proiectiile pe reteaua de studenti si clase.

## Making the projections on both components
bn.pr <- bipartite.projection(bn)
bn.student <- bn.pr$proj1
bn.class <- bn.pr$proj2
get.adjacency(bn.student,sparse=FALSE,attr="weight")
##    S1 S2 S3 S4 S5 S6
## S1  0  1  1  0  0  0
## S2  1  0  2  1  0  0
## S3  1  2  0  2  1  0
## S4  0  1  2  0  1  0
## S5  0  0  1  1  0  1
## S6  0  0  0  0  1  0
get.adjacency(bn.class,sparse=FALSE,attr="weight")
##    C1 C2 C3 C4
## C1  0  2  1  0
## C2  2  0  2  0
## C3  1  2  0  1
## C4  0  0  1  0
## Plotting the projections
shapes <- c("circle","square")
colors <- c("blue","red")
op <- par(mfrow=c(1,2))
plot(bn.student,vertex.color="blue",vertex.shape="circle",main="Students",
     edge.width=E(bn.student)$weight*2,vertex.size=15,vertex.label.degree=-pi/2,
     vertex.label.dist=1.2,vertex.label.cex=1)
plot(bn.class,vertex.color="red",vertex.shape="square",main="Classes",
     edge.width=E(bn.student)$weight*2,vertex.size=15,vertex.label.degree=-pi/2,
     vertex.label.dist=1.2,vertex.label.cex=1)

In cele ce urmeaza, vom prezenta analiza unui set de date de tipul “2-Mode” ce cuprinde filme din perioada 1999-2014, precum si actorii care au avut cele mai importante roluri in aceste filme.

## Importing the data set
data(hwd)
h1 <- hwd

## Presenting data from the data set
h1
## IGRAPH 9cdab39 UN-B 1365 1600 -- 
## + attr: name (v/c), type (v/l), year (v/n), IMDBrating (v/n),
## | MPAArating (v/c)
## + edges from 9cdab39 (vertex names):
##  [1] Inception          --Leonardo DiCaprio   
##  [2] Inception          --Joseph Gordon-Levitt
##  [3] Inception          --Ellen Page          
##  [4] Inception          --Tom Hardy           
##  [5] Inception          --Ken Watanabe        
##  [6] Inception          --Dileep Rao          
##  [7] Inception          --Cillian Murphy      
## + ... omitted several edges
V(h1)$name[1:10]
##  [1] "Inception"                                   
##  [2] "Alice in Wonderland"                         
##  [3] "Kick-Ass"                                    
##  [4] "Toy Story 3"                                 
##  [5] "How to Train Your Dragon"                    
##  [6] "Despicable Me"                               
##  [7] "Scott Pilgrim vs. the World"                 
##  [8] "Hot Tub Time Machine"                        
##  [9] "Harry Potter and the Deathly Hallows: Part 1"
## [10] "Tangled"
V(h1)$type[1:10]
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
V(h1)$IMDBrating[1:10]
##  [1] 8.8 6.5 7.8 8.4 8.2 7.7 7.5 6.5 7.7 7.9
V(h1)$name[155:165]
##  [1] "Notting Hill"               "Eyes Wide Shut"            
##  [3] "The Green Mile"             "10 Things I Hate About You"
##  [5] "American Pie"               "Girl, Interrupted"         
##  [7] "Leonardo DiCaprio"          "Joseph Gordon-Levitt"      
##  [9] "Ellen Page"                 "Tom Hardy"                 
## [11] "Ken Watanabe"
## Plotting a part of the network
V(h1)$shape <- ifelse(V(h1)$type==TRUE,"square","circle")
V(h1)$shape[1:10]
##  [1] "square" "square" "square" "square" "square" "square" "square" "square"
##  [9] "square" "square"
V(h1)$color <- ifelse(V(h1)$type==TRUE,"red","lightblue")
h2 <- subgraph.edges(h1, E(h1)[inc(V(h1)[name %in% c("The Wolf of Wall Street", 
                                                     "Gangs of New York",
                                                     "The Departed")])])
plot(h2, layout = layout_with_kk)

Mai departe este prezentata o analiza a actorilor, cu accent pe cei care au jucat in cele mai multe filme si cei care au jucat in cele mai apreciate filme

## Basic properties of actor projection
graph.density(h1)
## [1] 0.001718711
table(degree(h1,v=V(h1)[type==FALSE]))
## 
##   1   2   3   4   5   6   7   8 
## 955 165  47  23  11   2   1   1
mean(degree(h1,v=V(h1)[type==FALSE]))
## [1] 1.327801
V(h1)$deg <- degree(h1)

## Displaying busy actors
V(h1)[type==FALSE & deg > 4]$name
##  [1] "Leonardo DiCaprio" "Emma Watson"       "Richard Griffiths"
##  [4] "Harry Melling"     "Daniel Radcliffe"  "Rupert Grint"     
##  [7] "James Franco"      "Ian McKellen"      "Martin Freeman"   
## [10] "Bradley Cooper"    "Christian Bale"    "Samuel L. Jackson"
## [13] "Natalie Portman"   "Brad Pitt"         "Liam Neeson"
busy_actor <- data.frame(cbind(Actor = V(h1)[type==FALSE & deg > 4]$name,
                               Movies = V(h1)[type==FALSE & deg > 4]$deg))
busy_actor[order(busy_actor$Movies,decreasing=TRUE),]
##                Actor Movies
## 5   Daniel Radcliffe      8
## 11    Christian Bale      7
## 1  Leonardo DiCaprio      6
## 2        Emma Watson      6
## 3  Richard Griffiths      5
## 4      Harry Melling      5
## 6       Rupert Grint      5
## 7       James Franco      5
## 8       Ian McKellen      5
## 9     Martin Freeman      5
## 10    Bradley Cooper      5
## 12 Samuel L. Jackson      5
## 13   Natalie Portman      5
## 14         Brad Pitt      5
## 15       Liam Neeson      5
## Displaying actors from best movies
for (i in 161:1365) {
  V(h1)[i]$totrating <- sum(V(h1)[nei(i)]$IMDBrating)
}
max(V(h1)$totrating,na.rm=TRUE)
## [1] 60.9
pop_actor <- data.frame(
  cbind(Actor = V(h1)[type==FALSE & totrating > 40]$name,
        Popularity = V(h1)[type==FALSE & totrating > 40]$totrating))
pop_actor[order(pop_actor$Popularity,decreasing=TRUE),]
##               Actor Popularity
## 3  Daniel Radcliffe       60.9
## 4    Christian Bale       55.5
## 1 Leonardo DiCaprio       49.6
## 2       Emma Watson         45
## 5         Brad Pitt       40.5

In plus, putem analiza si relatii de dependenta intre datele pe care le extragem. De exemplu, se poate face o regresie intre numarul de filme in care a jucat un actor si media scorului acestor filme, pentru a determina daca exista sau nu o legatura liniara intre acestea. Dupa cum se poate vedea in urma analizei urmatoare, precum si din afisarea grafica a rezultatelor, nu se poate extrage o astfel de legatura.

for (i in 161:1365) {
  V(h1)[i]$avgrating <- mean(V(h1)[nei(i)]$IMDBrating)
}
num <- V(h1)[type==FALSE]$deg  
avgpop <- V(h1)[type==FALSE]$avgrating
summary(lm(avgpop ~ num))
## 
## Call:
## lm(formula = avgpop ~ num)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9858 -0.4330  0.1977  0.6170  1.6142 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.33868    0.05440 134.911   <2e-16 ***
## num          0.04714    0.03527   1.337    0.182    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9605 on 1203 degrees of freedom
## Multiple R-squared:  0.001483,   Adjusted R-squared:  0.0006528 
## F-statistic: 1.786 on 1 and 1203 DF,  p-value: 0.1816
scatter.smooth(num,avgpop,col="lightblue",ylim=c(2,10),span=.8,
               xlab="Number of Movies",ylab="Avg. Popularity")

In ultima parte a analizei, ne vom uita la proiectiile acestei retele, in particular la cea a filmelor. Iar la aceasta vom analiza mai multe statistici, pentru ca in final sa vizualizam reteaua formata din filmele care au in comun actori cu macar alte 5 filme, din care se poate observa ca fac partea cu preponderenta serii de filme precum: Star Wars, Harry Potter, The Hobbit, etc.

## Extract projections for the network
h1.pr <- bipartite.projection(h1)
h1.act <- h1.pr$proj1
h1.mov <- h1.pr$proj2
h1.act
## IGRAPH bbf71e2 UNW- 1205 6903 -- 
## + attr: name (v/c), year (v/n), IMDBrating (v/n), MPAArating (v/c),
## | shape (v/c), color (v/c), deg (v/n), totrating (v/n), avgrating
## | (v/n), weight (e/n)
## + edges from bbf71e2 (vertex names):
##  [1] Leonardo DiCaprio--Joseph Gordon-Levitt
##  [2] Leonardo DiCaprio--Ellen Page          
##  [3] Leonardo DiCaprio--Tom Hardy           
##  [4] Leonardo DiCaprio--Ken Watanabe        
##  [5] Leonardo DiCaprio--Dileep Rao          
##  [6] Leonardo DiCaprio--Cillian Murphy      
## + ... omitted several edges
h1.mov
## IGRAPH 52bde14 UNW- 160 472 -- 
## + attr: name (v/c), year (v/n), IMDBrating (v/n), MPAArating (v/c),
## | shape (v/c), color (v/c), deg (v/n), totrating (v/n), avgrating
## | (v/n), weight (e/n)
## + edges from 52bde14 (vertex names):
##  [1] Inception--The Wolf of Wall Street    Inception--Django Unchained          
##  [3] Inception--The Departed               Inception--Gangs of New York         
##  [5] Inception--Catch Me If You Can        Inception--The Dark Knight Rises     
##  [7] Inception--10 Things I Hate About You Inception--Batman Begins             
##  [9] Inception--The Dark Knight            Inception--Training Day              
## [11] Inception--Big Fish                  
## + ... omitted several edges
## Printing movie projection
op <- par(mar = rep(0, 4))
plot(h1.mov,vertex.color="red",vertex.shape="circle",
     vertex.size=(V(h1.mov)$IMDBrating)-3,vertex.label=NA)

## Extracting and plotting the biggest component from movie projection
graph.density(h1.mov)
## [1] 0.03710692
no.clusters(h1.mov)
## [1] 12
clusters(h1.mov)$csize
##  [1] 148   1   1   1   1   1   1   2   1   1   1   1
table(E(h1.mov)$weight)
## 
##   1   2   3   4   5   6   7  10 
## 411  21  12  16   6   1   2   3
h2.mov <- induced.subgraph(h1.mov, vids=clusters(h1.mov)$membership==1)
plot(h2.mov,vertex.color="red",edge.width=sqrt(E(h1.mov)$weight),
     vertex.shape="circle",vertex.size=(V(h2.mov)$IMDBrating)-3,vertex.label=NA)

table(graph.coreness(h2.mov))
## 
##  1  2  3  4  5  6  7 
## 11  5 23 65 29  7  8
## Extracting and plotting only the nodes with coreness bigger than 4
h3.mov <- induced.subgraph(h2.mov,vids=graph.coreness(h2.mov)>4)
h3.mov
## IGRAPH dd22b04 UNW- 44 158 -- 
## + attr: name (v/c), year (v/n), IMDBrating (v/n), MPAArating (v/c),
## | shape (v/c), color (v/c), deg (v/n), totrating (v/n), avgrating
## | (v/n), weight (e/n)
## + edges from dd22b04 (vertex names):
## [1] Alice in Wonderland                      --Hot Tub Time Machine               
## [2] Hot Tub Time Machine                     --The Interview                      
## [3] The Hobbit: The Battle of the Five Armies--The Hobbit: The Desolation of Smaug
## [4] Inception                                --The Wolf of Wall Street            
## [5] Exodus: Gods and Kings                   --American Hustle                    
## + ... omitted several edges
plot(h3.mov,vertex.color="red",vertex.shape="circle",
     edge.width=sqrt(E(h1.mov)$weight),vertex.label.cex=0.7,
     vertex.label.color="darkgreen",vertex.label.dist=0.3,
     vertex.size=(V(h3.mov)$IMDBrating)-3)

Capitolul 10: Modele de retele aleatoare

In analiza unei retele moderne sociale se desting urmatoarele caracterisici:

  1. Este motivata de intelegerea intuitiva a structurii retelei bazata pe legaturile dintre actanti

  2. Are la baza date empirice extrase intr-un mod sistematic

  3. Se bazeaza foarte mult pe vizualizarea datelor utilizand metode grafice

  4. Depinde de utilizarea modelelor matematice si computationale

Analizele prezentate pana in acest moment au pus accent pe primele trei concepte, iar in cele ce urmeaza il vom aborda pe cel din urma. Mai precis, vom incerca in continuare sa utilizam diferite modele pentru a genera o retea care sa aiba caracteristici similare cu reteaua initiala (cea a traficantilor).

O prima abordare luata in considerare, este propusa de Paul Erdos si Alfred Renyi. In aceasta metoda, reteaua este generata pornind de la un specificat de noduri si adaugand, in mod aleator, un numar specificat de muchii. Datorita acestei constructii, acest model este referit si sub denumirea de “random graph model”. Alternativ, pentru constructia unui astfel de model se poate specifica in locul numarului de muchii, o probabilitate de a avea muchie intre doua noduri ale retelei. In continuare, este prezentat codul pentru generarea retelei utilizand cele doua metode si vizualizarea grafica a rezultatelor comparativ cu reteaua initiala.

## Generate a similar network using Erdos-Renyi method (by specify the number of edges)
no_nodes <- length(V(inetsym))
no_edges <- length(E(inetsym))
er_net1 <- erdos.renyi.game(n=no_nodes,no_edges,type='gnm')

## Generate a similar network using Erdos-Renyi method (by specify the probability of having an edge between two nodes)
edge_prob <- no_edges / ((no_nodes-1)*no_nodes)
er_net2 <- erdos.renyi.game(n=no_nodes, edge_prob,type='gnp')

op <- par(mfrow=c(1,3))
plot(inetsym,vertex.label=NA,vertex.size=10)
plot(er_net1, vertex.label=NA, vertex.size=10)
plot(er_net2, vertex.label=NA, vertex.size=10)

Modelul prezentat anterior este unul simplu de inteles, insa din cauza generalitatii sale, nu se pot genera retele care sa aiba anumite particularitati dorite (care sunt observate la o retea obtinuta empiric). Un astfel de exemplu, este faptul ca desi dimensiunile retelelor sociale pot fi de ordinul milionelor de noduri, diametrul acestora este considerabil mai mic, fapt ce nu este foarte usor de replicat in modelul anterior. De aceea modelul “Small-World” doreste sa imbunatateasca acest lucru pornind de la o configuratie prestabilita a muchiilor retelei, urmata de o modificarea a acestora cu o anumita probabilitate. Se poate observa ca folosind probabilitatea de modificare a muchiilor cu valoare 1, configuratia initiala nu va mai conta, iar modelul va fi similar cu cel prezentat anterior. Inainte de a prezenta codul pentru generarea acestui model, vom explica cum arata configuratia initiala a retelei. Nodurile sunt asezate in cerc, iar fiecare nod este legat de k dintre cei mai apropiati vecini ai sai (atat cei din dreapta, cat si cei din stanga). In exemplu urmator vom determina acest k, tinand cont de gradul mediu al unui nod, iar pentru probabilitatea de modificarea a unei muchii vom utiliza trei valori diferite ale parametrului (0.25, 0.5, 0.75). Analizand grafic cat de similare sunt aceste retele comparativ cu cea initiala.

## Generating a similar network using Small-World Model
avg_degree <- 2*no_edges/no_nodes
ws_net1 <- watts.strogatz.game(dim=1, size=no_nodes, nei=avg_degree/2, p=.25)
ws_net2 <- watts.strogatz.game(dim=1, size=no_nodes, nei=avg_degree/2, p=.5)
ws_net3 <- watts.strogatz.game(dim=1, size=no_nodes, nei=avg_degree/2, p=.75)

## Visualizing the results
op <- par(mfrow=c(2,2))
plot(inetsym,vertex.label=NA,vertex.size=10)
plot(ws_net1, vertex.label=NA, vertex.size=10)
plot(ws_net2, vertex.label=NA, vertex.size=10)
plot(ws_net3, vertex.label=NA, vertex.size=10)

Din rezultatele grafice, se poate observa ca nici acest model nu este unul multumitor. In plus, o observatie importanta pentru cele doua modele este ca nici unul nu foloseste proprietati ale gradelor nodurilor din retea. Intr-o retea empirica, gradele nodurilor nu au o distributie apropiata de medie, existand multe noduri cu grad mic, dar si noduri cu grad considerabil mai mare decat medie (intr-o retea sociala, aceste noduri ar fi persoanele celebre din diferite domenii: actori, sportivi, politicieni, etc.). Aceasta problema este adresata in modelul de generare “Scale-Free” utilizat mai jos.

## Generate a similar network using Scale-Free Model
b_net <- barabasi.game(no_nodes, directed=FALSE)

## Visualize the result
op <- par(mfrow=c(1,2))
plot(inetsym,vertex.label=NA, vertex.size=10)
plot(b_net,vertex.label=NA, vertex.size=10)

In finalul acestui capitol, sunt prezentate statistici comparative ale retelelor generate si rezultatele observate in cadrul retelei empirice.

## Comparing random models with the empirical network
list_network <- c(er_net1, ws_net2, b_net, inetsym)
comparison_table <- data.frame(
  Name = c("Erdos-Renyi", "Small world", "Scale-free model", "Empiric network"),
  Size = c(length(V(er_net1)), length(V(ws_net2)), length(V(b_net)), length(V(inetsym))),
  Density = c(gden(asNetwork(er_net1)),gden(asNetwork(ws_net2)),gden(asNetwork(b_net)),gden(asNetwork(inetsym))),
  Avg_Degree = c(length(E(er_net1))/length(V(er_net1)),length(E(ws_net2))/length(V(ws_net2)),length(E(b_net))/length(V(b_net)),length(E(inetsym))/length(V(inetsym))),
  Transitivity = c(transitivity(er_net1), transitivity(ws_net2), transitivity(b_net), transitivity(inetsym)),
  Isolates = c(sum(degree(er_net1)==0),sum(degree(ws_net2)==0),sum(degree(b_net)==0),sum(degree(inetsym)==0))
)
comparison_table
##               Name Size   Density Avg_Degree Transitivity Isolates
## 1      Erdos-Renyi   21 0.1714286   1.714286    0.2340426        2
## 2      Small world   21 0.1000000   1.000000    0.0000000        3
## 3 Scale-free model   21 0.0952381   0.952381    0.0000000        0
## 4  Empiric network   21 0.1714286   1.714286    0.2500000        0